Advances Towards Data-Race-Free Cache Coherence Through Data Classification
Author: M. Davari
Abstract
Davari, M. 2017. Advances Towards Data-Race-Free Cache Coherence Through Data Classification. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1521. 64 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-554-9925-9.

Providing a consistent view of shared memory based on precise and well-defined semantics (a memory consistency model) has been an enabling factor in the widespread acceptance and commercial success of shared-memory architectures. Moreover, hardware cache coherence protocols relieve programmers of the burden of dealing with the memory inconsistency that emerges in the presence of private caches. The principle behind all such cache coherence protocols is to guarantee that consistent values are read from the private caches at all times. In its most stringent form, a cache coherence protocol eagerly enforces two invariants before each data modification: i) no other core has a copy of the data in its private caches, and ii) all other cores know where to obtain the consistent data should they need it later.

Nevertheless, by partly transferring the responsibility for maintaining those invariants to the programmers, commercial multicores have adopted weaker memory consistency models, namely Total Store Order (TSO), in order to optimize performance for the more common cases. Moreover, memory models with more relaxed invariants have been proposed, based on the observation that more and more software is written in compliance with Data-Race-Free (DRF) semantics. The hardware can leverage the semantics of DRF software to infer when data in the private caches might be inconsistent. As a result, the hardware ignores the inconsistent data and retrieves the consistent data from the shared memory. DRF semantics therefore relieves the hardware of the burden of eagerly enforcing the strong consistency invariants before each data modification. Instead, consistency is guaranteed only when needed. This enables several optimizations, such as lower energy consumption and better performance and scalability.

How efficiently the inconsistent data is detected and discarded is an important factor in the efficiency of such coherence protocols. For instance, discarding data that is still consistent does not affect correctness, but it results in performance loss and increased energy consumption. In this thesis we show how data classification can be leveraged as an effective tool to simplify cache coherence based on DRF semantics. In particular, we introduce simple but efficient hardware-based private/shared data classification techniques that can be used to efficiently detect the inconsistent data, thus enabling low-overhead and scalable cache coherence solutions based on DRF semantics.
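The idea of discarding only potentially inconsistent data at synchronization points can be illustrated with a toy software model. The sketch below is not the thesis's hardware design: the names `Classifier`, `Core`, and `acquire`, the per-address classification granularity, and the write-through policy are all assumptions chosen to keep the example small. It shows a core dropping only the cached data classified as shared when it synchronizes, while private data stays cached.

```python
class Classifier:
    """Hypothetical classifier: an address is private until a second core touches it."""

    def __init__(self):
        self.first_owner = {}   # addr -> id of the first core that accessed it
        self.shared = set()     # addresses accessed by more than one core

    def record(self, core_id, addr):
        owner = self.first_owner.setdefault(addr, core_id)
        if owner != core_id:
            self.shared.add(addr)

    def is_shared(self, addr):
        return addr in self.shared


class Core:
    """Toy core with a private cache in front of a shared memory (a dict)."""

    def __init__(self, core_id, memory, classifier):
        self.core_id = core_id
        self.memory = memory
        self.classifier = classifier
        self.cache = {}
        self.discarded = 0      # number of lines dropped by self-invalidation

    def read(self, addr):
        self.classifier.record(self.core_id, addr)
        if addr not in self.cache:              # miss: fetch from shared memory
            self.cache[addr] = self.memory.get(addr, 0)
        return self.cache[addr]                 # hit: may be stale until acquire()

    def write(self, addr, value):
        self.classifier.record(self.core_id, addr)
        self.cache[addr] = value
        self.memory[addr] = value               # assumption: write-through for simplicity

    def acquire(self):
        """Synchronization point: self-invalidate shared lines only."""
        stale_candidates = [a for a in self.cache if self.classifier.is_shared(a)]
        for addr in stale_candidates:
            del self.cache[addr]
        self.discarded += len(stale_candidates)


memory = {}
classifier = Classifier()
c0 = Core(0, memory, classifier)
c1 = Core(1, memory, classifier)

c0.write("x", 1)        # x starts out private to core 0
c0.write("p", 7)        # p stays private to core 0
c1.read("x")            # second core touches x, so x becomes shared
c0.write("x", 2)        # core 1 now holds a stale copy of x
c1.acquire()            # core 1 drops its shared lines at synchronization
print(c1.read("x"))     # 2
c0.acquire()            # core 0 drops x but keeps the private line p
print(c0.cache)         # {'p': 7}
```

Without the classification, `acquire` would have to drop every cached line, including `p`, which is still consistent. That is the performance and energy cost the abstract attributes to discarding consistent data.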
Similar resources
Scope-Aware Classification: Taking the Hierarchical Private/Shared Data Classification to the Next Level
Hierarchical techniques are commonplace in ameliorating the bottlenecks, such as cache coherence, in the design of scalable multi/manycores. Furthermore, there have been proposals to simplify the coherence based on the data-race-free semantics of the software and private/shared data classification, where cores self-invalidate their shared data upon synchronizations. However, naive private/share...
VIPS: Simple Directory-Less Broadcast-Less Cache Coherence Protocol
Coherence in multicores introduces complexity and overhead (directory, state bits) in exchange for local caching, while being “invisible” to the memory consistency model. In this paper we show that a much simpler (directory-less/broadcast-less) multicore coherence provides almost the same performance without the complexity and overhead of a directory protocol. Motivated by recent efforts to sim...
Non-Strict Cache Coherence: Exploiting Data-Race Tolerance in Emerging Applications
Software distributed shared memory (DSM) platforms on networks of workstations tolerate large network latencies by employing one of several weak memory consistency models. Data-race tolerant applications, such as Genetic Algorithms (GAs), Probabilistic Inference, etc., offer an additional degree of freedom to tolerate network latency: they do not synchronize shared memory references, and behave...
Coherence in the CMP ERA: Lesson learned in designing a LLC architecture
Designing an efficient memory system is a big challenge for future multicore systems. In particular, multicore systems increase the number of requests towards the memory systems, so the design of efficient on-chip caches is crucial to achieve adequate level of performance. Solutions based on conventional, big sized cache may be improved due to wire delay effects, so NUCA and D-NUCA cache may re...
Accelerating Data Race Detection with Minimal Hardware Support
We propose a high performance hybrid hardware/software solution to race detection that uses minimal hardware support. This hardware extension consists of a single extra instruction, StateChk, that simply returns the coherence state of a cache block without requiring any complex traps to handlers. To leverage this support, we propose a new algorithm for race detection. This detection algorithm u...